Skip to content

BUG: GH4080, int conversion of underlying series to float needs updating in parent frame #4081

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

jreback
Copy link
Contributor

@jreback jreback commented Jun 29, 2013

closes #4080

essentially a variation on #3970

here block consolidation and cache is not the issue,
rather the dtype of the series is int, and as a result of the update (directly on the series)
changes dtype to float (which is fine)

BUT the parent still has this in an Int Block

this PR essentially creates an Observer Pattern thru a weakref
that allows notification of chanes (propogated back to the BlockManager)
which can then take action if needed

@jreback
Copy link
Contributor Author

jreback commented Jun 29, 2013

@wesm reprise of #3970, no other way around this as far as I can see

@@ -666,11 +667,17 @@ def _get_item_cache(self, item):
values = self._data.get(item)
res = self._box_item_values(item, values)
cache[item] = res
res._cacher = (item,weakref.ref(self))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i envy your ability to see this solution so fast

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nah this was basically my solution on the other issue (which took a while to figure out); but was replaced with a slightly different form of considation (I this one is actually better though)

@wesm
Copy link
Member

wesm commented Jul 2, 2013

Sigh, more signs that pandas is due for a major internal refactor

@jreback
Copy link
Contributor Author

jreback commented Jul 2, 2013

yep....you want to include this or wait till 0.13 (where even after Series refactor I think this still might be needed)?

…ing in parent frame

DOC: update release notes
@jreback
Copy link
Contributor Author

jreback commented Jul 5, 2013

@wesm ?

@wesm
Copy link
Member

wesm commented Jul 5, 2013

Hold on just a minute here. The Series dtype is changing in place?

I'm seeing worse behavior on my machine:

In [8]: df
Out[8]: 
       val
a b c     
1 1 1    0
2 2 2    0
3 3 3    0

In [9]: df['val'].update(s)

In [10]: df
Out[10]: 
                       val
a b c                     
1 1 1                    0
2 2 2  4607182418800017408
3 3 3                    0

In [11]: df['val'].values
Out[11]: array([                  0, 4607182418800017408,                   0])

Don't approve of hacking around this issue or modifying an ndarray's dtype in place.

@jreback
Copy link
Contributor Author

jreback commented Jul 5, 2013

your example shows the current behavior (this with PR)

In [1]: df = DataFrame(dict((c, [1,2,3]) for c in ['a', 'b', 'c']))

In [2]: df.set_index(['a', 'b', 'c'], inplace=True)

In [3]: s = Series([1], index=[(2,2,2)])

In [4]: df['val'] = 0

In [5]: df
Out[5]: 
       val
a b c     
1 1 1    0
2 2 2    0
3 3 3    0

In [6]: df['val'].update(s)

In [7]: df
Out[7]: 
       val
a b c     
1 1 1    0
2 2 2    1
3 3 3    0

In [8]: df['val'].values
Out[8]: array([ 0.,  1.,  0.])

in place is ok for int->float, but doesn't work if its an object array (see #3217), but that's not an issue here

@wesm
Copy link
Member

wesm commented Jul 5, 2013

I don't understand why it needs to upcast?

@jreback
Copy link
Contributor Author

jreback commented Jul 5, 2013

in this particular case it doesn't in thery, but

and this is contrived too....if we could avoid the direct series updateing (df['val'].update(...))
then I think would be ok

In [1]: df = DataFrame(dict((c, [1,2,3]) for c in ['a', 'b', 'c']))

In [2]: df.set_index(['a', 'b', 'c'], inplace=True)

In [3]: df['val'] = 0

In [4]: s2 = Series([1.5],index=[(2,2,2)])

In [5]: df['val'].update(s2)

In [6]: df['val']
Out[6]: 
a  b  c
1  1  1                      0
2  2  2    4609434218613702656
3  3  3                      0
Name: val, dtype: int64

In [7]: pd.__version__
Out[7]: '0.11.0'

@jreback
Copy link
Contributor Author

jreback commented Jul 5, 2013

@wesm I guess let's push to 0.13....assume you want to get 0.12 out the door....:)

@wesm
Copy link
Member

wesm commented Jul 5, 2013

ok i'll push to 0.13/0.12.1

@jreback
Copy link
Contributor Author

jreback commented Jul 25, 2013

closing in favor of #3482 which addresses this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: series update
3 participants